Generalized Parallel Join Algorithms and Designing Cost Models

Author

  • Alice Pigul
Abstract

Applications for large-scale data analysis use techniques such as parallel DBMSs, the MapReduce (MR) paradigm, and columnar storage. In this paper we focus on the MapReduce environment. The aim of this work is to compare different join algorithms and to design cost models for further use in a query optimizer.

Similar resources

Parallel Hash-Based Join Algorithms for a Shared-Everything

We analyze the costs, and describe the implementation, of three hash-based join algorithms for a general-purpose shared-memory multiprocessor. The three algorithms considered are the Hashed Loops, GRACE, and Hybrid algorithms. We also describe the results of a set of experiments that validate the cost models presented and demonstrate the relative performance of the three algorithms.

An Optimal Skew-insensitive Join and Multi-join Algorithm for Distributed Architectures

The development of scalable parallel database systems requires the design of efficient algorithms for the join operation, which is the most frequent and expensive operation in relational database systems. The join is also the operation most vulnerable to data skew and to the high cost of communication in distributed architectures. In this paper, we present a new parallel algorithm for join and m...

Parallel processing of "group-by join" queries on shared nothing machines

SQL queries involving join and group-by operations are frequently used in many decision support applications. In these applications, the input relations are usually very large, so parallelizing these queries is highly recommended in order to obtain a desirable response time. The main drawbacks of the presented parallel algorithms that treat this kind of query are...

An Efficient Parallel Join Algorithm Based On Hypercube-Partitioning

Many parallel join algorithms have been proposed so far, but most of them focus on minimizing the disk I/O and CPU costs. The communication cost, however, is also an important factor that can significantly affect join processing performance in multiprocessor systems. In this paper we propose an efficient parallel join algorithm, called CubeRobust, for hypercube multicomputer...

Clone Join and Shadow Join: Two Parallel Algorithms for Executing Spatial Join Operations

With the growing popularity of spatial applications, there has been a significant increase in the use of database systems for storing and querying spatial data. Spatial data is now readily available from a variety of sources including government mapping agencies, commercial sources, satellite images, and simulation outputs. As this trend continues, applications continue to execute increasingly c...

Publication date: 2012